ABSTRACT

Cyber  Red  Teaming  (CRT)  is  an  important  exercise  to  conduct  for  Defence  agencies 
built  on  large  technological  infrastructures.  Their  size  and  relative  importance  may 
make  them  high  priority  targets  for  criminal  organizations,  issue-motivated  groups 
and  even  foreign  governments  that  are  increasingly  capable  and  willing  to  use 
technology  for  intelligence  gathering.  However,  identifying  a  viable  attack  can  be  a 
time-consuming  process  for  human  analysts,  and  so  Automated  Planners  are  being 
considered  as  a  viable  method  of  discovering  possible  attack  paths  for  CRT. 

This report surveys the current state-of-the-art planning techniques, tools and
frameworks and their performance at international competitions. By comparing that
performance against the operational requirements and limitations of CRT problems,
it recommends the most suitable ones for trialling.


RELEASE  LIMITATION 


Approved  for  public  release 


UNCLASSIFIED 



Published  by 

Cyber  and  Electronic  Warfare  Division 

DSTO  Defence  Science  and  Technology  Organisation 

PO  Box  1500 

Edinburgh  South  Australia  5111  Australia 

Telephone:  1300  333  363 
Fax:  (08)  7389  6567 

©  Commonwealth  of  Australia  2013 

AR-016-282 

April  2015 




Automated  Cyber  Red  Teaming 


Executive  Summary 

Cyber  Red  Teaming  (CRT)  is  a  common  activity  performed  within  large  organisations 
to  assess  how  susceptible  their  infrastructure,  business  processes  and  staff  are  to 
attacks  from  cyber-enabled  adversaries.  CRT  involves  drafting  attack  plans  that  could 
succeed on the current state of the organisation, optional attack execution/simulation,
and  impact  analysis.  The  results  are  then  used  to  develop  mitigation  strategies  and 
countermeasures.  As  the  attack  plan  drafting  step  can  be  a  time-consuming  process, 
the  use  of  Automated  Planners,  Artificial  Intelligence  algorithms  that  generate 
problem-specific  plans,  is  suggested  to  help  reduce  the  cost  of  the  overall  exercise. 

There  are  3  major  categories  of  Automated  Planners:  state-space  planners,  planning 
graph  planners  and  hierarchical  task  network-based  planners.  State-space  planning  is 
in  essence  classical  path-finding  algorithms  like  breadth  first  search  and  A*,  with 
added heuristics for more informed exploration. Planning graph planners convert a
planning problem into planning graphs, data structures that compactly represent all
"possible futures" for a given problem in stepped layers, and explore these graphs to
identify the earliest layer where a goal-satisfying state is found. Hierarchical task
network techniques use expert-designed plan templates to constrain the search to
viable strategies, and focus on searching for multiple plans within that range.

Other  planning  approaches  exist,  but  they  are  not  competitive  with  the  three  already 
mentioned  in  terms  of  efficiency  and  success.  Additional  techniques  can  also  be 
applied  on  top  of  the  basic  planning  algorithms,  such  as  use  of  machine  learning  to 
guide  the  planning  and  factoring  in  uncertainty. 

Using the benchmarking results from the International Planning Competition, we
identified that algorithms and tools that use planning graphs are currently best suited
to the Defence Cyber context, as they scale better computationally for larger scenarios
and can generate longer attack plans within reasonable time.

In particular, Portfolio-based Planning (PbP), a parallel computing framework which
utilises  a  library  of  planning  techniques  and  learns  suitable  portfolio  configurations  for 
each  problem,  has  proven  to  be  a  promising  off-the-shelf  tool  for  planning.  We 
conclude  that  future  CRT  exercises  should  consider  trialling  PbP. 
Glossary

AI:  Artificial  Intelligence 

CRT:  Cyber  Red  Teaming 

DDoS:  Distributed  Denial  of  Service 

DFH:  Delete-Free  Heuristics 

FD:  Fast  Downward 

HTN:  Hierarchical  Task  Network 

ICAPS: International Conference on Automated Planning and Scheduling

IP: Internet Protocol

IPC:  International  Planning  Competition 

MDP:  Markov  Decision  Process 

NAT:  Network  Address  Translation 

NPC:  Non-Player  Character 

PbP:  Portfolio-based  Planning 

PDDL:  Planning  Domain  Description  Language 

POMDP:  Partially  Observable  Markov  Decision  Process 

SAT:  Satisfiability 

SQL:  Structured  Query  Language 

STRIPS:  Stanford  Research  Institute  Problem  Solver 

TTP:  Tactics,  Techniques  and  Procedures 
1. Introduction


Cyber  Red  Teaming  (CRT)  is  an  important  exercise  to  conduct  for  Defence  agencies  built 
on large technological infrastructures [1]. Their size and relative importance may make
them  high  priority  targets  for  criminal  organizations,  issue-motivated  groups  and  even 
foreign  governments  that  are  increasingly  capable  and  willing  to  use  technology  for 
intelligence  gathering. 

Stated simply, CRT exercises determine the vulnerabilities that affect one's cyber system
by searching for viable attack plans¹ and examining their effects on the system. It is a
labour-intensive exercise as it typically requires specialist human analysts and operators
to draft attack plans. Due to the dynamic nature of cyber environments, some findings from
these exercises can quickly become invalid, which means they should be conducted frequently.
Automation  of  the  exercise  would  allow  an  organization  to  discover  potential 
vulnerabilities  more  cost  effectively,  which  in  turn  will  permit  more  resources  to  be 
allocated  towards  mitigation  and  countermeasures. 

This  technical  note  introduces  options  available  for  automating  CRT,  specifically  through 
the  application  of  automated  planning  techniques.  It  is  intended  to  provoke  thoughts  for 
people  wanting  to  use  automated  planners  for  CRT.  Readers  of  this  note  are  expected  to 
have  some  background  in  computer  science,  and  exposure  to  general  artificial  intelligence 
concepts  [2]  will  be  beneficial.  Familiarity  with  automated  planning  is  not  necessary. 

The  rest  of  the  paper  is  organized  as  follows:  we  discuss  what  the  CRT  problem  is,  and 
provide  guiding  principles  for  modelling  CRT  scenarios  into  planning  problems.  We  then 
introduce  what  automated  planning  is,  discuss  several  modern  approaches,  and  consider 
the  applicability  of  automated  planning  to  CRT  problems.  Finally,  we  recommend  several 
state-of-the-art  planning  tools  for  trial  and,  more  generally,  when  it  is  suitable  to  use 
automated  planners  in  support  of  CRT  exercises  based  on  the  current  requirements. 

Details of the implementation and complexity analysis of various planning algorithms,
tools and frameworks will not be discussed here. This note is best treated as a pointer
to other literature that may be more relevant in specific cases; an extensive set of
references is included for this purpose (see Appendix A).


1  For the purposes of this literature review, an attack plan is defined as a sequence of actions which,
if taken by a person and/or a computer, could harm the target organization.
2. Cyber Red Teaming

CRT  is  a  term  often  used  interchangeably,  though  sometimes  inaccurately,  with 
penetration  testing  and  vulnerability  assessment.  While  CRT  is  an  exercise  in  finding 
possible  vectors  for  attack,  penetration  testing  is  an  exercise  in  actually  attacking  the 
system.  Vulnerability  assessment  on  the  other  hand  is  about  analysing  software  and 
exposing  coding  flaws  which  can  be  exploited. 

Vulnerability  assessment  is  conceptually  similar  to  CRT  but  studies  mostly  individual 
software.  It  lacks  the  broader  view  of  the  system  as  a  whole,  focusing  more  on  code  flaws 
and less on system configuration and business processes [3]. And while the outcome of
penetration testing has the same practical implications for a system as CRT, its attack
vectors are very narrow and often do not say much about the system overall.

This  section  discusses  modelling  of  an  adversary's  characteristics  and  behaviours  (red 
teaming),  modelling  cyber  infrastructure  from  a  systemic  perspective,  attack  plan 
construction  via  simulation,  and  the  issues  related  to  conducting  CRT.  This  will  help  show 
how  CRT  involves  aspects  of  both  penetration  testing  and  vulnerability  assessment,  but  is 
able  to  draft  attack  plans  that  utilise  multiple  vulnerabilities  across  the  system  rather  than 
isolated  ones. 


2.1  The  World  Model 

In  CRT  terms,  the  overall  system  that  is  being  red-teamed  is  commonly  referred  to  as  the 
World  Model  [4]  [5].  This  naming  captures  the  idea  that  cyber  systems  are  large,  complex 
digital  ecosystems  with  many  intelligent  entities  sharing  and  consuming  resources.  It  also 
alludes  to  the  practice  of  modelling  and  simulating  attacks  in  a  test  environment  as 
opposed  to  running  the  exercise  in  a  live,  production  environment.  For  our  purposes,  we 
divide  the  World  Model  into  two  parts:  the  adversary  and  the  environment  they  target. 


The  Adversary 

Not  all  adversaries  are  equal.  Each  adversary  has  a  specific  set  of  Tactics,  Techniques  and 
Procedures  (TTPs);  some  are  more  resourceful  and  better  resourced  than  others.  Others 
may  have  very  specific  objectives  when  attacking  an  organization.  Below  is  a  non- 
exhaustive  set  of  questions  regarding  an  adversary  one  could  and  should  ask  in 
constructing  a  Red  Teaming  agent  that  represents  them: 

•  The  adversary's  target:  who  or  what  are  they  after?  What  access  do  they  have  into 
various  parts  of  the  system? 

•  The  adversary's  offensive  capabilities:  this  includes  their  TTPs,  computational 
resources  and  domain  knowledge. 

•  The  adversary's  restrictions:  limited  time  windows  for  attacking,  anonymity, 
visibility  of  the  network  etc. 
•  The adversary's behaviour patterns: have they attacked before? Previous attack
patterns  and  targets  may  be  indicative  of  future  ones. 

Accurately  representing  an  adversary  in  a  CRT  exercise  will  make  the  proposed  attack 
plans  more  relevant,  and  also  affect  the  usefulness  and  reliability  of  the  results  when  used 
to  assess  the  actual  system. 


The  Target  Environment 

A real computer network for an organization such as the US Department of Defense is
generally very large, dynamic and complex [6]. Accurately modelling and simulating such
a  network  for  the  purposes  of  CRT  is  the  responsibility  of  the  exercise  creator,  and  may 
require  applying  abstractions  or  assumptions.  Below  are  some  guiding  questions  in 
support  of  building  a  problem-specific  World  Model: 

•  What  entities  are  there  in  the  World  Model?  This  may  include  computers,  users, 
software,  routers,  encrypted  storage,  network  policies  and  more. 

•  What  are  the  relationships  between  entities?  Examples  include  "User  A  has  an 
account  on  Computer  B"  and  "Computer  X  has  Software  Y  installed". 

•  What  are  the  World's  dynamics?  Some  system  behaviour  occurs  independently 
from  the  adversary's  actions. 

•  Which parts of the World are relevant? It is better to have a concise system
representation that pertains to the adversary's target to reduce unnecessary
exploration [7].

•  Which  parts  of  the  World  are  visible?  Not  all  aspects  of  the  system,  even  those 
that  are  relevant,  may  be  visible  to  an  adversary,  even  from  another  part  of  the 
system. 


2.2  Attack  Plan  Generation 

After  describing  a  World  Model,  attack  plans  can  then  be  drafted  in  accordance  with  an 
adversary's  TTP  set  through  simulated  execution  of  the  plan  on  the  model.  Each  attack 
plan  may  include  general  coverage  activities  such  as  port  scanning  and  IP  ranging,  or 
targeted  actions  such  as  sending  a  spear  phishing  email.  Some  attacks  may  also  depend  on 
specific  responses  from  the  target  machine  or  user.  Through  simulation,  damage 
assessment  and  mitigation  planning  based  on  the  attack  effects  can  be  estimated. 

There  are  often  numerous  possible  attack  plans  conceived  during  a  CRT  exercise.  The  red 
team  (exercise  runners  playing  the  role  of  an  adversary)  chooses  which  of  these  attacks  to 
attempt  first  using  one  or  more  of  the  following  factors: 

•  Concerns-based:  prioritise  attacks  that  are  most  concerning  (to  the  organisation) 

•  Success-based:  prioritise  attacks  that  are  most  likely  to  succeed 

•  Cost-based:  prioritise  attacks  that  consume  the  least  resources 

•  Impact-based:  prioritise  attacks  that  are  most  damaging  if  successful 
•  Opportunity-based: prioritise attacks that are relevant to certain situations

•  Verification-based:  prioritise  attacks  that  have  been  dealt  with  before.  This  may  be 
to  verify  that  the  protections/countermeasures  already  put  in  place  are  working  as 
expected. 

The  level  of  plan  abstraction  is  another  consideration  when  preparing  for  a  CRT  exercise. 
If  the  exercise  is  only  a  thought  experiment,  a  plan  describing  attack  patterns  may  suffice, 
whereas an executable attack plan will require step-by-step details. Regardless, selecting
the most suitable approach will help ensure that the red teaming exercises conducted
meet the priorities of the organization they are conducted for.


2.3  Issues  and  Challenges 

Planning  and  conducting  a  CRT  exercise  may  face  a  variety  of  practical  challenges,  some 
of  which  cannot  be  remedied  and  may  require  changes  to  the  exercise: 

•  Limited  Resources:  the  time  and  computational  cost  to  conduct  certain  exercises 
may  be  infeasible,  such  as  the  data-mining  needed  to  conduct  spear  phishing, 
simulating  a  DDoS  attack  etc. 

•  Asymmetric Threat: adversaries may have TTPs that are beyond an organization's
own capabilities, thus limiting its ability to detect or defend against such attacks.

•  Reactivity:  As  some  forms  of  cyber-attacks  occur  and  are  completed  in  a  matter  of 
milliseconds,  the  situation  assessment  may  need  to  be  done  in  near  real-time, 
online,  continuously,  within  the  production  environment,  and  with  no  human-in- 
the-loop.  This  means  the  CRT  exercise  will  have  to  employ  computationally  fast 
techniques  which  may  entail  loss  of  precision,  if  such  techniques  exist  at  all. 

•  Model Complexity: As computer network size grows, so does its complexity [8].
Because of this, modelling a large computer network realistically may prove to be a
difficult or even impossible undertaking.

•  Model  Incompleteness:  Having  no  known  vulnerabilities  doesn't  mean  there  are 
no  vulnerabilities.  Modelling  an  adversary  requires  detailed  and  current 
intelligence  on  them,  which  may  not  always  be  available.  In  such  cases  CRT  is  only 
a  best  effort  to  replicate  the  real  threat  or  the  real  environment. 

These challenges limit what conclusions can be drawn from CRT exercises, and it is
the objective of continued research and technology improvement to make CRT more
efficient and more automated within these practical limitations.
3. Automated Planning

Automated planning is a branch of Artificial Intelligence (AI) that is concerned with
the generation of plans [9]. The planner is tasked with answering one question: given a set of
possible  actions,  an  initial  state,  and  some  goals,  can  a  sequence  of  these  actions  be  found 
such  that  their  execution  will  transition  the  system  from  the  initial  state  into  a  goal  state? 

Automated planners are most popularly used in logistics [10] [11], scheduling [12],
robotics [13] and computer game engines [14]. Most automated planners are general
purpose and can be used to solve planning problems in a variety of domains [15].
However, the performance of different planning tools and techniques can vary depending
on the planning problem itself [16].

The  remainder  of  this  section  discusses  these  characteristics  with  respect  to  CRT,  and  how 
modern  planning  techniques  can  be  used  to  solve  CRT  problems.  Nau  [9]  provides  more 
details  regarding  related  theory  and  technique  implementation. 


3.1  The  Planning  Problem 

A  planning  problem  has  an  initial  state  of  a  system,  and  by  performing  a  sequence  of 
actions,  a  goal  state  can  be  reached.  Each  planning  problem  is  encoded  for  a  specific 
domain  (e.g.  airport  logistics),  which  may  have  specific  types  of  objects  (e.g.  flights)  and 
propositions  (e.g.  passenger  P  is  checked  in  on  flight  QF123)  not  present  in  other  domains. 
Some  planning  techniques  are  able  to  leverage  specific  traits  of  specific  domains. 

The  Stanford  Research  Institute  Problem  Solver  (STRIPS)  semantics  [17]  and  the  Planning 
Domain  Description  Language  (PDDL)  [18]  are  the  two  most  popular  input  languages  for 
defining  the  problem  state  as  well  as  the  available  library  of  actions.  STRIPS  has  been 
around  for  much  longer  than  PDDL,  and  is  more  prevalent  in  current  industrial  planning 
tools,  while  PDDL  is  newer,  and  was  created  primarily  for  benchmarking  purposes. 

In both planning languages, a state is represented by a discrete set of observable, first-class
entities, referred to as objects². Facts about these objects are referred to as
propositions³. Examples of objects in CRT include host machines, users, software, websites
and services, and a proposition may be something like "user X has an admin account on
host machine Y".

An  action  is  defined  by  the  following: 

•  Preconditions:  the  propositions  that  must  be  true  in  order  to  perform  this  action 

•  Add  Effects:  objects  and  propositions  that  are  introduced  into  the  state  by  taking 
this  action 


2  Some  literature  on  planning  also  uses  the  term  "instances" 

3  The  term  "predicates"  is  sometimes  used  as  well 
•  Delete Effects: objects and propositions that are removed from the state by taking
this action

•  Costs: non-boolean, quantitative values of the state affected by this action.

In  terms  of  CRT,  the  planning  problem  is  to  draft  an  attack  plan.  All  entities  and 
relationships  within  the  World  Model,  including  where  the  adversary  sits  on  the  network, 
would form the initial state of the system. The goal state will contain the changes to the
World Model that meet the adversary's objective. The actions are the adversary's TTPs.
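
The action definition above can be illustrated with a minimal Python sketch. It is not taken from any particular planner: the `Action` class, the fact tuples and the `exploit_cms` action are hypothetical names chosen for illustration, and states are assumed to be plain sets of ground propositions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action over sets of ground propositions (illustrative only)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset
    cost: float = 1.0

    def applicable(self, state: frozenset) -> bool:
        # An action can be taken only if all its preconditions hold in the state.
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        if not self.applicable(state):
            raise ValueError(f"{self.name} is not applicable in this state")
        # Remove the delete effects first, then introduce the add effects.
        return (state - self.delete_effects) | self.add_effects


# Hypothetical World Model: where the adversary sits and what it can reach.
initial_state = frozenset({
    ("adversary_at", "internet"),
    ("reachable", "internet", "webserver"),
    ("runs", "webserver", "vulnerable_cms"),
})

# Hypothetical TTP encoded as an action.
exploit_cms = Action(
    name="exploit_cms",
    preconditions=frozenset({
        ("adversary_at", "internet"),
        ("reachable", "internet", "webserver"),
        ("runs", "webserver", "vulnerable_cms"),
    }),
    add_effects=frozenset({("has_shell", "webserver")}),
    delete_effects=frozenset(),
    cost=3.0,
)

# The adversary's objective expressed as propositions that must hold.
goal = frozenset({("has_shell", "webserver")})

next_state = exploit_cms.apply(initial_state)
goal_reached = goal <= next_state
```

In this sketch the goal test is a subset check, which mirrors the idea that a goal state only needs to contain the propositions the adversary cares about.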


3.2  Planning  Domain  Characteristics 

Not all planning problems are the same. Planning the drive to work and planning a
winning chess strategy not only require different sets of actions, but the environments in
which the problems reside are also different. It is important to understand these
differences,  as  specific  techniques  may  be  more  suited  to  specific  domains. 

According  to  Nau  [9]  and  Russell  [2],  a  planning  domain  is  characterised  by  the  following: 

•  Observability: a system is fully observable if every object and proposition of its
current state is known to the planner. Otherwise it is considered only partially
observable.  In  CRT,  this  observability  relates  to  the  visibility  of  a  network 
environment  from  the  perspective  of  the  adversary. 

•  Determinism:  are  the  effects  of  agent  actions  on  the  state  predictable? 

•  Dynamics: does the state of the system change independent of agent action/plan
execution?

•  Temporality:  does  the  time  taken  to  complete  actions  matter? 

•  Granularity:  are  action  decisions,  effects  and  costs  discrete  or  continuous  values? 

In  some  instances,  there  may  also  be  problem-specific  requirements: 

•  Library:  what  actions  are  available  to  the  planner  for  a  particular  scenario? 

•  Optimality: do we want the lowest-cost plan (optimal) or just any valid plan
(satisficing)?

•  Ordering:  are  we  generating  totally  ordered  or  partially  ordered  plans?  Partially 
ordered  plans  allow  for  contingencies  where  actions  are  non-deterministic,  or 
where  the  system  is  dynamic. 

•  Preferences:  costs  associated  with  a  plan  may  be  relevant,  which  affect  the  actions 
we  prefer. 

•  Extended  goals  and  constraints:  are  there  certain  states  we  don't  want  to  pass 
through  during  plan  execution?  In  other  words,  do  we  need  to  ensure  parts  of  the 
system  are  unaffected  by  our  attack? 

•  Performance:  does  the  planning  need  to  be  done  in  real-time  or  is  doing  it  offline 
acceptable? 

It  is  critical  to  select  a  planning  technique  suitable  to  the  planning  problem's  domain,  but 
care  should  also  be  taken  to  avoid  over-estimating  these  requirements.  For  instance,  the 
computational effort required to guarantee that a plan is optimal can exceed the effort to
reach  a  sub-optimal  alternative  by  several  orders  of  magnitude  [19].  If  the  optimal  plan  is 
not  required,  we  don't  need  to  select  an  algorithm  with  such  a  capability,  as  it  may  conflict 
with  other  requirements  such  as  reactivity  for  real-time,  operational  use. 

The  most  common  type  of  domain  tested  in  academia  is  a  planning  domain  that  is  fully 
observable,  deterministic  and  has  a  static  system  [20].  This  is  referred  to  as  the  classical 
planning  problem.  There  is  a  view  within  the  automated  planning  community  that,  with 
some  model-mapping  work,  the  currently  more  efficient  and  reliable  classical  planning 
techniques  can  be  used  to  solve  non-classical  planning  problems  [21],  but  in  many  real- 
world  domains  performing  such  a  mapping  is  hard.  As  such,  some  level  of  abstraction  is 
needed  when  using  these  planners  in  practice. 


3.3  State-of-the-art Automated Planning

Historically,  most  planning  techniques  and  algorithms  were  designed  with  the  computing 
power of their time as a feasibility constraint [22]. As such they were seldom scalable
solutions, and found little audience outside of academia. This changed when DARPA ran a
special workshop in 1990 [23], which pushed researchers away from worrying about computing
power, and more towards developing scalable techniques and real-world applications.

CRT exercises are typically large-scale, dynamic planning problems in partially observable
environments [24] [25], which makes many of the older planners unsuitable as they don't
take advantage of the increased computational power available today.

This  section  explores  a  variety  of  approaches  that  are  used  by  modern  automated  planning 
tools.  The  goal  is  to  convey  a  basic  understanding  of  the  approaches  and  their  associated 
strengths  and  weaknesses.  Appendix  A  contains  a  table  comparing  implementations  of  the 
various  planning  techniques,  and  can  be  used  as  a  catalogue  for  exploring  planning  tools 
beyond  the  ones  recommended  in  this  report. 


3.3.1  State-Space  Planning 

State-Space Planners [9] model the planning problem as a directed graph where nodes
represent possible states the system can be in, and arcs/edges are the actions that move
the system from one state to another. Once the graph is constructed, generating the plan
becomes a path-finding problem. Existing AI graph search algorithms such as Iterative
Deepening  A*  Search  [26]  can  be  leveraged  for  this  step  of  the  planning.  The  final  plan  is 
represented  by  the  path  between  the  initial  state  node  and  the  goal  state  node.  Figure  1 
shows  part  of  a  state-space  graph. 
Figure 1 - Example of one expansion step of the State-Space for a block stacking problem.

The  state-space  planning  approach  is  simple  and  has  been  shown  to  be  very  effective  in 
planning  for  many  domains  [9].  As  such,  it  is  one  of  the  most  popular  and  enduring 
techniques  for  real-world  planning  problems.  Many  implementations  of  general  purpose 
state-space  planners  exist  [27]  [28],  and  solutions  engineered  towards  specific  problem 
domains  have  also  been  developed  [29]  [10]. 

State-space  planners  are  most  effective  for  problems  with  fully  observable,  deterministic 
environments,  and  handle  preferences  and  constraints  well  during  plan  generation  [15]. 
Most  have  the  option  of  providing  completeness  and  optimality  guarantees,  and  can  be 
highly  engineered  by  using  domain-specific  heuristics  to  guide  the  search.  State-space 
graphs  scale  poorly  for  problems  in  dynamic  systems  and  non-deterministic  world  models 
due  to  state-space  explosion  inherent  in  the  branching  factor,  which  is  dependent  on  the 
number  of  valid  actions,  parameters  and  objects  at  any  given  step. 
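
As a rough illustration of state-space planning, the following Python sketch runs a breadth-first search over states built from a small action library. The action names and facts are hypothetical, and breadth-first search is used only because it is the simplest instance of the approach; practical planners use heuristic search such as A*.

```python
from collections import deque

# Each hypothetical action: (name, preconditions, add effects, delete effects).
ACTIONS = [
    ("scan_network", {"on_internet"}, {"knows_webserver"}, set()),
    ("exploit_webserver", {"knows_webserver"}, {"shell_on_webserver"}, set()),
    ("pivot_to_database", {"shell_on_webserver"}, {"shell_on_database"}, set()),
    ("phish_admin", {"on_internet"}, {"admin_credentials"}, set()),
]


def successors(state):
    """Yield (action name, resulting state) for every applicable action."""
    for name, pre, add, delete in ACTIONS:
        if pre <= state:
            yield name, frozenset((state - delete) | add)


def breadth_first_plan(initial, goal):
    """Return a shortest list of action names reaching the goal, or None."""
    start = frozenset(initial)
    frontier = deque([(start, [])])
    visited = {start}  # states are nodes; revisiting one adds nothing new
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for name, nxt in successors(state):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [name]))
    return None


plan = breadth_first_plan({"on_internet"}, {"shell_on_database"})
```

The `visited` set grows with the number of distinct states, which is where the state-space explosion described above shows up in practice.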

The  majority  of  the  current  research  in  state-space  planning  is  focused  on  pruning  the 
search  space  for  planning  under  uncertainty  [30]  [31]  [32],  and  leveraging  the  relatively 
new concept of Delete-Free Heuristics (DFH) [33] [34]. When calculating the heuristic
value in a DFH, the planner pretends that actions have no delete effects during the
expansion phase. This speeds up the initial search, and deletions can be applied later for
the plan validity check.
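
The delete relaxation can be sketched in a few lines of Python. This is a simplified level-counting estimate, assumed here for illustration rather than taken from any cited planner: it counts how many rounds of relaxed expansion are needed before every goal fact has been reached, ignoring all delete effects. The action names and facts are hypothetical.

```python
# Each hypothetical action: (name, preconditions, add effects, delete effects).
ACTIONS = [
    ("exploit_gateway", {"on_internet"}, {"shell_on_gateway"}, set()),
    ("disable_firewall", {"shell_on_gateway"}, {"firewall_down"}, {"gateway_service_up"}),
    ("reach_database", {"firewall_down"}, {"database_reachable"}, set()),
]


def delete_free_level(state, goal, actions):
    """Rounds of relaxed expansion until all goal facts are reached.

    Delete effects are deliberately ignored, so facts only accumulate.
    Returns None when the goal is unreachable even under the relaxation.
    """
    reached = set(state)
    level = 0
    while not goal <= reached:
        new_facts = set()
        for _name, pre, add, _delete in actions:
            if pre <= reached:
                new_facts |= add - reached
        if not new_facts:
            return None  # fixed point reached without satisfying the goal
        reached |= new_facts
        level += 1
    return level


estimate = delete_free_level(
    {"on_internet", "gateway_service_up"}, {"database_reachable"}, ACTIONS
)
```

Note that `disable_firewall` would remove `gateway_service_up` in the real problem, but the relaxed computation never removes it. A state-space planner could use such an estimate to rank states during search.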

Many state-space planners use DFH, but Katz and Hoffmann point out in [35] that this
form of relaxation actually slows the search when planning in an environment with
non-replenishable resources. This may be a problem for CRT, as actions may involve disabling
network nodes or services as part of a larger or longer attack, which means use of DFH
may not be recommended.


3.3.2  Planning  Graphs 

Planning  Graphs  are  data  structures  that  capture  both  the  set  of  possible  and  impossible 
states  a  system  can  be  in  by  keeping  track  of  mutexes:  pairs  of  observations  that  cannot  be 
true  at  the  same  time  [36].  For  instance,  an  airline  system  can  be  in  a  state  where  two  of  its 
planes are in the same airport, but it cannot be in a state where one plane is in two
different  airports  at  the  same  time. 

Structured as a directed, layered graph, each node in a Planning Graph is either an action
node or a propositional node, belonging to an action or propositional layer respectively. The graph
grows  from  the  initial  state,  and  expands  into  the  next  layer  based  on  the  action  library. 

The  action  layer  represents  the  full  set  of  possible  actions  that  could  be  taken  at  a  given 
step,  while  the  propositional  layer  represents  the  set  of  states  the  system  could  be  in  after 
any  of  those  actions  are  taken.  Arcs  represent  either  the  action-effect  or  precondition- 
action  relationships  between  action  and  propositional  nodes. 

The  goal  state  is  reachable  when  we  arrive  at  a  propositional  layer  where  all  the  required 
propositions  are  true.  Then  the  flow  from  the  initial  state  to  this  layer  will  be  the  shortest 
partially-ordered  plan. 
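
The layered expansion and mutex bookkeeping can be sketched as follows. This is a simplified reading of the standard planning graph construction, not the implementation of any cited tool: persistence ("no-op") actions carry propositions forward, and the search is cut off by an assumed `max_layers` bound instead of detecting when the graph levels off. The airline actions are hypothetical and follow the plane example above.

```python
from itertools import combinations


def noop(p):
    """Persistence action that carries proposition p into the next layer."""
    return ("noop:" + p, frozenset({p}), frozenset({p}), frozenset())


def expand(props, prop_mutex, actions):
    """Build the next action layer, proposition layer and proposition mutexes."""
    layer = [a for a in actions + [noop(p) for p in sorted(props)]
             if a[1] <= props
             and not any(frozenset(pair) in prop_mutex
                         for pair in combinations(a[1], 2))]

    def actions_conflict(a, b):
        _, pre_a, add_a, del_a = a
        _, pre_b, add_b, del_b = b
        # Inconsistent effects or interference.
        if del_a & (add_b | pre_b) or del_b & (add_a | pre_a):
            return True
        # Competing needs: some pair of preconditions is already mutex.
        return any(frozenset({p, q}) in prop_mutex for p in pre_a for q in pre_b)

    action_mutex = {frozenset({a[0], b[0]})
                    for a, b in combinations(layer, 2) if actions_conflict(a, b)}
    next_props = frozenset().union(*(a[2] for a in layer))

    def props_conflict(p, q):
        # p and q are mutex when every pair of their achievers is mutex.
        return all(a[0] != b[0] and frozenset({a[0], b[0]}) in action_mutex
                   for a in layer if p in a[2]
                   for b in layer if q in a[2] or q in b[2] and False or q in b[2])

    next_mutex = {frozenset({p, q})
                  for p, q in combinations(sorted(next_props), 2)
                  if props_conflict(p, q)}
    return layer, next_props, next_mutex


def first_goal_layer(initial, goal, actions, max_layers=10):
    """Earliest layer where all goal propositions appear and none are mutex."""
    props, mutex = frozenset(initial), set()
    for depth in range(max_layers + 1):
        if goal <= props and not any(frozenset(pair) in mutex
                                     for pair in combinations(goal, 2)):
            return depth
        _, props, mutex = expand(props, mutex, actions)
    return None


# Hypothetical airline domain: one plane starting at airport A.
ACTIONS = [
    ("fly_A_B", frozenset({"at_A"}), frozenset({"at_B"}), frozenset({"at_A"})),
    ("fly_A_C", frozenset({"at_A"}), frozenset({"at_C"}), frozenset({"at_A"})),
]

layer1, props1, mutex1 = expand(frozenset({"at_A"}), set(), ACTIONS)
```

After one expansion the proposition layer contains `at_A`, `at_B` and `at_C`, but every pair is recorded as mutex, which captures the constraint that one plane cannot be at two airports at once. Finding the first suitable layer is only the reachability test; a full planner would still need to extract a plan from the graph.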

Planning Graphs can be examined directly to generate the plan, and are often integrated
into state-space planners such as Hoffmann's FastForward family [28]. Planning graphs
are much smaller in size than equivalent state-space graphs; they only grow polynomially
with respect to search depth. However, planning graph planners are ultimately forward search
planners [37]. As such, there is an unavoidable overhead in generating a planning graph
for every new problem.

Many  state-of-the-art  planners  make  use  of  planning  graphs  [36]  [38]  [39]  [40]  [41]  [42], 
because  they  allow  for  scalable  plan  generation  with  respect  to  plan  length  and  action 
library  size,  and  they  provide  implicit  cycle  detection  of  search  paths  since  each  layer  of 
the  graph  compactly  represents  all  possibilities  after  n  steps.  These  graphs  can  also  be 
deployed  for  dynamic  systems  to  generate  a  complete  representation  that  includes  action 
effects on all the possible worlds [27]. However, the structure can be hard to comprehend
by  human  inspection  for  larger  problems,  and  it  remains  the  responsibility  of  the 
algorithm  to  understand  and  extract  the  plan. 


3.3.3  Hierarchical  Task  Networks 

Hierarchical Task Networks (HTNs) are another data structure that has been popular in applied
automated planning [43] [16]. Unlike planning graphs, which are driven by the
relationship  between  states,  HTNs  are  driven  by  the  relationship  between  actions.  Instead 
of  trying  to  reach  a  goal  state  through  graph  search,  HTNs  view  the  planning  problem  as 
trying  to  perform  a  task  which  can  be  decomposed  into  smaller  tasks,  where  the  subtasks 
at  the  atomic  level  are  called  "primitive  tasks"  or  "operators". 

A  primitive  task  has  preconditions  that  are  satisfied  by  binding  instances,  the  objects  in  the 
system,  to  them.  A  precondition  is  a  predicate  of  the  state.  Primitive  tasks  represent  the 
actions that change the system, and a valid plan is generated when the preconditions of all
primitive tasks inside a task network are satisfied through valid variable bindings. Some
HTN  planners  also  allow  non-primitive  tasks  to  have  their  own  preconditions,  which  may 
reduce  the  number  of  precondition  tests  at  the  atomic  level. 
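
As a rough illustration of task decomposition, the sketch below implements a totally ordered, depth-first HTN decomposition over ground (variable-free) tasks. It is a simplified sketch under assumptions: variable binding is omitted, methods carry their own preconditions, and the task names, operators and methods are hypothetical examples rather than the encoding used by any cited planner.

```python
def htn_plan(state, tasks, operators, methods):
    """Decompose tasks left to right; return a list of primitive task names or None.

    state:     frozenset of propositions that currently hold
    tasks:     list of task names still to perform, in order
    operators: primitive task name -> (preconditions, add effects, delete effects)
    methods:   compound task name -> list of (preconditions, list of subtasks)
    """
    if not tasks:
        return []
    head, rest = tasks[0], tasks[1:]
    if head in operators:  # primitive task: check preconditions, apply effects
        pre, add, delete = operators[head]
        if not pre <= state:
            return None
        tail = htn_plan((state - delete) | add, rest, operators, methods)
        return None if tail is None else [head] + tail
    for pre, subtasks in methods.get(head, []):  # compound task: try each method
        if pre <= state:
            plan = htn_plan(state, subtasks + rest, operators, methods)
            if plan is not None:
                return plan
    return None


# Hypothetical task structures (illustrative names only).
operators = {
    "send_phish": (frozenset({"email_known"}), frozenset({"foothold"}), frozenset()),
    "exploit_service": (frozenset({"service_exposed"}), frozenset({"foothold"}), frozenset()),
    "dump_credentials": (frozenset({"foothold"}), frozenset({"creds"}), frozenset()),
}
methods = {
    "gain_access": [
        (frozenset({"service_exposed"}), ["exploit_service"]),
        (frozenset({"email_known"}), ["send_phish"]),
    ],
    "steal_credentials": [(frozenset(), ["gain_access", "dump_credentials"])],
}
plan = htn_plan(frozenset({"email_known"}), ["steal_credentials"], operators, methods)
print(plan)
```

The preconditions attached to the two `gain_access` methods show how a non-primitive task can rule out a whole branch before any of its primitive subtasks are tested.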


Unlike  state-space  planning  and  planning  graphs,  where  only  the  basic  operators  need  to 
be  encoded,  HTNs  require  a  domain  expert  to  determine  and  encode  the  task  structures 
which the HTN planners can then use [44]. In doing so, many dead-end plans can be
avoided since every plan under the HTN is valid, and the main effort becomes finding a
set  of  bindings  that  is  relevant  given  the  initial  state.  The  bindings  do  not  necessarily  need 
to  be  grounded;  as  long  as  a  particular  set  of  variable  types  and  relationships  are  true  in 
the  initial  state  of  the  planning  problem,  plan  validity  can  be  inferred. 

Because  of  its  efficiency  compared  to  state-space  planners,  planning  with  HTN  is  by  far  the 
most  popular  technique  in  terms  of  industry  application,  having  been  used  in  robotics  [13] 
[45],  NPC  scripting  for  computer  games  [14]  [46],  manufacturing,  logistics  and  scheduling. 

The  effectiveness  and  comprehensiveness  of  HTNs  depend  on  the  human  encoding  the 
task  structures.  In  the  case  of  CRT,  HTNs  may  only  be  applicable  for  validation-based  and 
success-based  exercises,  where  the  attack  vectors  are  already  established.  HTNs  have  been 
used for CRT in a Defence context [47].


3.3.4  Machine  Learning 

While  Machine  Learning  theory  and  techniques  have  been  around  for  decades,  they  were 
only  adopted  by  the  automated  planning  community  in  the  90s  as  a  way  to  enhance  or 
improve the planning process [48]. Machine learning has been applied to automated
planning  in  several  ways:  policy  learning,  parameter  tuning,  discovering  macro  actions 
and  portfolio  construction. 

•  Policy  Learning  [49]  is  about  identifying  potentially  conflicting  propositions  (like 
mutexes  from  Planning  Graphs),  and  developing  problem-specific  search  policies 
to reduce dead ends. Planning problems that benefit from policy learning the most
are ones whose state spaces contain many dead ends that stem from a small number
of easily reached 'bad' states, where simply avoiding such states will make
planning faster and more successful. An example in the airport scheduling domain
would be a policy of never keeping a plane airborne for more than 24
consecutive hours, which reduces wear on the hardware.

•  Parameter  Tuning  is  optimization  of  the  weights  and  invariants  associated  with 
search  heuristics  for  specific  problems  or  problem  classes.  For  example,  some 
actions  might  be  able  to  satisfy  sub-goals  that  are  difficult  to  reach,  and  achieving 
those  sub-goals  earlier  may  merit  heavier  weightings  in  the  cost  function.  This  in 
turn  will  allow  the  planner  to  favour  those  actions  in  the  plan. 

•  Macro  Action  Learning  [50]  focuses  on  discovering  and  maintaining  a  library  of 
useful  plan  structures  (partial  plans)  that  can  help  solve  larger,  harder  planning 
problems  in  a  specific  domain.  It  is  in  essence  automated  construction  of  partial 
HTNs.  Learning  macro  actions  requires  significant  upfront  training;  a  large  sample 
of  planning  problems  are  needed  to  determine  which  partial  structures  are 
generally useful, and worth keeping in the macro action library.

• Portfolio Construction is a form of ensemble learning: the use of multiple
algorithms  to  solve  the  current  problems,  and  to  predict  future  algorithm  success 
for  similar  problems.  Optimization  occurs  at  the  algorithm  selection  level,  where 
an  automated  system  will  run  one  or  more  planners  from  a  library  of  planners  to 
solve  a  planning  problem.  The  selection  and  configuration  of  planners  depend  on 
how  well  suited  each  technique  is  estimated  to  be  by  the  system  for  the  problem 
domain  at  hand.  Like  macro  action  learning,  portfolio  learning  requires  significant 
training to construct an accurate portfolio of each domain/problem set. Portfolio-based
planners can, however, leverage distributed computing architectures by
scheduling  different  planners  on  different  processing  units  to  run  concurrently. 

The  main  advantage  of  adding  machine  learning  to  automated  planners  is  increased 
automation,  reducing  the  work  a  human  operator  would  need  to  do  in  planner 
reconfiguration,  helping  the  planner  adapt  to  the  problems  encountered  in  a  domain  at 
run  time.  However,  machine  learning  carries  some  costs,  most  notably  the  training  aspect. 
For  maximum  effectiveness,  the  training  set  must  be  representative  of  the  planning 
problems  that  will  be  encountered.  If  this  training  is  not  available  upfront,  then  the 
planner  must  be  able  to  continue  to  learn  over  time. 
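
To make the portfolio idea more concrete, the following is a naive sketch of portfolio construction from training results. It assumes the only training signal is which problems each planner solved, and it uses a greedy coverage heuristic chosen here for illustration; real portfolio planners also learn time allocations and configurations. The planner names and problem identifiers are hypothetical.

```python
def build_portfolio(solved_by, max_planners):
    """Greedily pick planners that add the most newly solved training problems.

    solved_by: planner name -> set of training problem ids that planner solved
    Returns (ordered list of chosen planners, set of training problems covered).
    """
    portfolio, covered = [], set()
    while len(portfolio) < max_planners:
        best = max(
            (p for p in solved_by if p not in portfolio),
            key=lambda p: (len(solved_by[p] - covered), p),  # ties broken by name
            default=None,
        )
        if best is None or not solved_by[best] - covered:
            break  # no remaining planner adds coverage
        portfolio.append(best)
        covered |= solved_by[best]
    return portfolio, covered


# Hypothetical training results (illustrative only).
training = {
    "planner_a": {1, 2, 3, 4},
    "planner_b": {3, 4, 5},
    "planner_c": {1, 2},
}
portfolio, covered = build_portfolio(training, max_planners=2)
print(portfolio, sorted(covered))
```

The quality of such a portfolio depends entirely on how representative the training problems are, which is the cost noted above.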


3.3.5  Other  Approaches 

Many more planning techniques exist; however, detailed discussion lies outside the scope
of this note for two reasons. Firstly, some have been superseded by one of the
techniques discussed earlier. This is particularly true in terms of state-of-the-art
performance, as will be shown later. Secondly, others are unsuited to the automation of CRT
planning because of the further abstractions needed to model these problems and make them
solvable. Such abstractions may render the generated attack plans meaningless or useless for
vulnerability assessment and mitigation planning. Below is a selection of four of the more
interesting such techniques.

•  Plan-Space  Planning  [51]  is  a  graph  traversal  approach  similar  to  state-space 
planning.  However  nodes  represent  partial  plans  instead  of  system  state,  and  arcs 
represent  plan  refinement  operations  such  as  adding  or  removing  actions  from  the 
plan.  The  algorithm  generally  starts  from  an  initial  plan  that  contains  flaws,  with 
the  goal  being  plan  refinement  through  flaw  elimination.  Compared  to  state-space 
planners,  plan-space  planners  are  computationally  inefficient,  and  there  is  no 
domain-agnostic,  systematic  way  to  construct  the  initial  plan. 

•  Planning  as  Satisfiability  (or  SAT  planning)  [52]  encodes  the  planning  problem  as 
a  Boolean  satisfiability  problem,  then  solves  it  using  stochastic  local  search 
algorithms.  SAT  planning  is  popular  for  static  systems  such  as  electronic  design 
automation,  but  is  not  applicable  in  dynamic  systems  like  the  problems  faced  in 
CRT. 

• Markov Decision Processes (MDPs) [53] are a mathematical structure for
representing actions in planning domains where the effects are non-deterministic.
The output of solving an MDP is a partial order plan containing conditional actions
based on the observed effect from previous actions. There is current research into
solving Partially Observable MDPs (POMDPs) for advanced applications in
penetration testing [54], but solving MDPs has been shown to be intractable.

•  Model  Checking  [55]  is  similar  to  SAT  planning,  but  a  custom  planning  model  is 
created  in  order  to  determine  whether  a  valid  plan  exists  in  theory  before 
attempting to generate one. The majority of planners that do not follow the
aforementioned approaches generally fall under this category.

Aside  from  planning  techniques  listed  above,  a  wide  variety  of  hybrid  approaches  exist, 
some  of  which  are  highly  engineered  to  solve  specific  classes  of  planning  problems. 


The table below summarises which techniques are best suited for particular types of planning
problems, regardless of the domain they come from.


Problem trait         State-space    Planning       HTNs             Plan-space         SATPlan          MDPs
                      planners       Graphs                          planners
----------------------------------------------------------------------------------------------------------------------
Action library size   Small          Medium         Large            Medium             Small            Small
Observability         Full           Full           Full             Full               Full             Partial
Determinism           Deterministic  Deterministic  Deterministic    Deterministic      Deterministic    Probabilistic
Dynamic states        Yes            Maybe          No               No                 No               Yes
Plan Optimality       Any            Any            Any              Satisficing        Satisficing      Satisficing
Plan Ordering         Any            Any            Totally Ordered  Partially Ordered  Totally Ordered  Totally Ordered
Preference handling   Supported      Supported      Add-on           Supported          Supported        Supported
Constraint handling   Yes            Yes            Yes              Yes                Yes              No

3.4  Benchmarking  via  the  International  Planning  Competition 

While  describing  the  various  fundamental  planning  approaches  above,  a  number  of 
assertions  were  made  with  respect  to  their  success  rate  in  solving  planning  problems  as 
well  as  how  efficiently  they  did  so.  The  quantitative  comparisons  were  largely  based  on 
results  from  the  International  Planning  Competitions  (IPCs)  [56]  [57]  [15]  [58]  [59]  [60]  [61] 
[62] [20] [63] [64].

The  IPC,  which  is  run  in  conjunction  with  the  International  Conference  on  Automated 
Planning  and  Scheduling  (ICAPS),  has  been  benchmarking  automated  planners  since  1998 
in an attempt to quantitatively measure the current state of the art.

[Figure 2 (timeline diagram). Recoverable content:]

Classical and related tracks:
- IPC-1998 (McDermott): PDDL 1.0, introduction of a standard language; STRIPS/ADL planning
- IPC-2000 (Bacchus): Hand-Coded Track
- IPC-2002 (Fox & Long): PDDL 2.1, temporal modelling; Temporal Track; last Hand-Coded Track; VAL automated plan validator
- IPC-2004 (Hoffmann & Edelkamp): PDDL 2.2, timed initial literals and axioms; Optimal Track
- IPC-2006 (Gerevini, Saetti, Haslum & Dimopoulos): PDDL 3.0, preferences (Preferences Track); shift of focus to plan quality metrics
- IPC-2008 (Do, Helmert & Refanidis): new formally defined scoring metrics; PDDL 3.1, object fluents and action costs; Preferences becomes Net Benefit Track

Uncertainty tracks:
- IPC-2004 (Littman & Younes): PPDDL; Fully Observable Probabilistic Track (MDP)
- IPC-2006 (Bonet & Givan): Non-Observable Non-Deterministic Track (Conformant)
- IPC-2008 (Buffet & Bryce): Fully Observable Non-Deterministic Track
- IPC-2011 (Sanner & Yoon): RDDL (compilation to PPDDL provided); Partially Observable Probabilistic Track (POMDP)

Learning tracks:
- IPC-2008 (Fern, Khardon & Tadepalli): PDDL 1.0 STRIPS domains; learn to find plans faster
- IPC-2011 (Jimenez, Coles & Coles): quality and time metrics; Pareto dominance criterion

ICKEPS: 2005 (Bartak & McCluskey), 2007 (Edelkamp & Frank), 2009 (Bartak, Fratini & McCluskey), 2012 (Vaquero & Fratini)

Tracks: Classical (Satisficing), Hand Coded, Temporal, Optimal, Preferences/Net Benefit, MDP, Learning, Multi Core, Conformant, Fully Observable Non-Deterministic, POMDP, Knowledge Engineering


Figure 2: Historical Tracks of the IPC


As  shown  in  Figure  2,  a  variety  of  competition  tracks  for  different  problem  classes  have 
emerged  over  time,  but  only  the  classical  track  has  run  for  every  competition.  The  classical 
track  also  consistently  receives  the  most  entries.  The  CRT  domain  contains  problems  that 
can  belong  to  all  these  classes,  but  the  largest  subset  does  reside  in  the  classical  track.  For 
this  reason,  the  state-of-the-art  will  be  more  evident  in  this  track. 

Figure  3  shows  the  success  rates  of  classical  planners  by  year.  The  best  performing 
implementation  of  each  major  planning  approach  was  chosen  to  represent  its  respective 
sub-discipline.  When  interpreting  the  graph,  it  is  important  to  note  that  each  competition 
was different. For example, running time constraints varied from 10 minutes to 2
hours, the complexity of the problem sets changed drastically from year to year, and over
the span of 13 years computational resources also increased significantly.

This makes it difficult to quantify exactly how much better a planner with 90% coverage is
compared to one with 85% coverage, given that the additional 5% could come from the
hardest problems. However, the competition organisers have described the problem set of
each competition as pushing the field's state of the art, thus consistent performance is a
strong indicator for candidacy. For the more recent competitions, planners employing
planning  graphs  appear  to  be  the  most  successful  in  terms  of  problem  coverage,  followed 
by State-Space planners and HTN planners.

[Figure 3 (line chart): Planner Coverage of IPC domains/problems. Series: State-Space
Planning, Plan-Space Planning, Planning Graph, Planning as Satisfiability, HTNs, Model
Checking.]

Figure  3  -  Performance  of  various  planning  techniques  based  on  problems  solved 


In  a  separate  analysis  of  the  learning  track,  it  was  shown  that  machine  learning  does 
improve the performance of planners [65]. However, significant training was needed to
achieve the improvement, and in general the training was domain-specific. This means
that policies, macro actions and portfolios learned for one domain/problem class could
not be reused in another domain. Machine learning can still benefit planning for CRT in
the  long  run,  provided  that  the  training  set  includes  sufficient  examples  of  all  types  of 
cyber-attacks,  and  supplementary  training  is  conducted  when  the  CRT  problem  evolves. 

The  IPC  benchmarks  provide  a  strong  indicator  of  general  strength  of  planning 
techniques,  and  the  test  problems  used  are  increasingly  designed  to  mimic  real  problems 
faced by industry. However, there is limited participation by companies that have built
proprietary planning software; therefore the IPC alone, and indeed the academic literature
surveyed for this report, does not necessarily capture the complete state of the art.

The  IPC  also  favours  planners  that  fare  well  across  multiple  domains  and  problem  classes, 
which promotes good general purpose tools rather than specialized solutions.


4. Automated Cyber Red Teaming

Now  that  we  have  discussed  CRT  issues  as  well  as  had  an  overview  of  automated 
planning,  we  discuss  how  best  to  model  CRT  as  a  planning  problem  and  provide 
suggestions  of  tools  and  frameworks  that  may  be  suitable  for  automating  the  exercise. 


4.1  Cyber  Red  Teaming  as  a  Planning  Problem 

As  mentioned  in  Section  3,  automated  planning  has  been  used  in  computer  game  engines. 
Modern  computer  and  video  games  often  use  automated  planners  for  scripting  Non- 
Player  Character  (NPC)  behaviour  [14],  particularly  when  it  plays  an  adversarial  role.  It 
has been shown that NPCs using automated planners are able to make tactical and
strategic decisions with better outcomes than expert human players could [66].

This ability to simulate intelligent autonomous adversaries like the NPCs in the game
engine means that automated planners may lend themselves well to aspects of CRT, since the
two problems are very similar. Other cyber security activities including, but not limited to,
vulnerability risk assessment and mitigation planning could also be performed with
automated planning tools. However, for this report, the attack plan drafting in CRT
is the primary focus.

CRT is also by nature a planning problem. The entities within the network environment
are the objects, and the relationships between them as well as their attributes are the
propositions. The TTPs need to be encoded as plan actions with the appropriate
dependencies and constraints, where the action effects reflect changes to the entities in the
target network.
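
The mapping described above can be sketched as a small classical planning problem. This is an illustrative sketch only: the propositions, the three ground actions standing in for TTPs, and the breadth-first forward search are all assumptions made for the example, and a real encoding would normally be written in PDDL and handed to one of the planners discussed later.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    pre: frozenset     # propositions required in the current state
    add: frozenset     # propositions the action makes true
    delete: frozenset  # propositions the action makes false


def forward_search(init, goal, actions):
    """Breadth-first state-space search; returns a shortest totally ordered plan or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for action in actions:
            if action.pre <= state:
                nxt = (state - action.delete) | action.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, plan + [action.name]))
    return None


# Hypothetical network facts and TTPs (illustrative names only).
init = {"at_internet", "web_exposed", "web_vulnerable", "db_trusts_web"}
ttps = [
    Action("exploit_web", frozenset({"at_internet", "web_exposed", "web_vulnerable"}),
           frozenset({"shell_web"}), frozenset()),
    Action("pivot_to_db", frozenset({"shell_web", "db_trusts_web"}),
           frozenset({"db_reachable"}), frozenset()),
    Action("dump_db", frozenset({"db_reachable"}),
           frozenset({"data_exfiltrated"}), frozenset()),
]
attack_plan = forward_search(init, frozenset({"data_exfiltrated"}), ttps)
print(attack_plan)
```

Removing `web_vulnerable` from the initial state makes the goal unreachable, which is the kind of result a mitigation study would look for.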


4.2  CRT  Planning  Domain  characterisation 

Using  the  list  of  planning  problem  traits  from  Section  3.2,  here  are  the  assumptions  and 
abstractions  suggested  for  conducting  a  CRT  exercise. 

4.2.1  Observability 

Automated  CRT  is  likely  to  be  simpler  if  the  domain  is  treated  as  a  fully  observable 
system,  such  that  everything  potentially  needed  for  constructing  attack  plans  is  known 
and  available.  There  are  several  reasons  for  approaching  the  problem  this  way: 

•  If  the  real  network  is  only  partially  observable  to  all  entities,  valid  attack  plans  may 
exist  but  what  is  known  and  available  to  the  red  team  may  not  be  sufficient  to 
construct  them.  In  such  cases  it  would  still  be  better  to  assume  what  the  red  team 
partially  observes  is  the  full  system,  so  that  the  more  efficient  classical  planners  can 
be  used.  This  is  acceptable  as  the  conclusion  drawn  regarding  viable  attack  options 
would be the same from an adversary's perspective.

• From an efficiency perspective, removing irrelevant objects and propositions from
the planning process will reduce the load on the planner. Where such reductions
can be inferred at a cost that is justified by the gain in planner
performance, they should be applied. However, where such problem reductions are not
possible, it is generally safer to overestimate the knowledge that an adversary has
to construct a plan rather than underestimate it. This ensures that all possible
attack vectors are considered at least once.

• Real networks use technologies such as proxies and firewalls to hide their
architecture from outsiders, which some adversaries may find difficult to
overcome.  However,  if  we  factor  in  alternate  reconnaissance  vectors  such  as  social 
engineering,  it  is  best  to  assume  an  adversary  can  still  gain  full  network  visibility. 

CRT on fully observable systems not only leads to a more complete set of attack plans, but
also decouples the planning work from the adversary-dependent analysis of attack plan
cost, risk and impact. The latter profile can then be used for a more
efficient triage process. For these reasons, modelling the CRT exercise as a fully observable
planning problem will suffice for the first cut.

4.2.2  Determinism 

In  practice,  many  attacks  may  fail  or  have  unintended  side  effects.  For  example,  a  phishing 
email  may  not  arrive  at  a  target's  inbox  due  to  being  blocked,  or  the  target,  upon  receiving 
the  email,  chooses  not  to  open  the  attachment  containing  the  malicious  payload  and 
forwards  the  email  to  their  network  admin.  A  realistic  model  of  CRT  planning  is  non- 
deterministic,  as  repeat  execution  of  the  same  attack  plan  may  have  different  outcomes 
even  if  the  system  state  remains  the  same. 

However,  modelling  non-determinism  is  problematic,  as  the  distribution  of  possible 
outcomes  for  each  type  of  attack  is  generally  not  known.  Learning  each  attack's  failure  rate 
is  a  separate  and  time-consuming  process  that  blocks  the  main  CRT  exercise.  It  is  also 
difficult  due  to  the  context  sensitivity  of  these  learned  distributions,  which  may  make  a  set 
of  considered  side  effects  in  one  setting  inapplicable  in  another. 

As  such,  we  suggest  that  every  action  the  adversary  takes  is  assumed  to  be  deterministic 
and  is  always  successful.  This  assumption  allows  the  use  of  more  planning  techniques,  as 
only  MDPs  are  currently  effective  at  solving  non-deterministic  planning  problems.  Also,  it 
is  better  to  have  false  positives  than  false  negatives  in  CRT. 

4.2.3  State  Dynamics 

A  given  cyber  environment  is  likely  to  contain  both  static  aspects  (system  architecture, 
geography  etc.)  and  dynamic  aspects  (staffing  arrangements,  network  packet  flows).  Even 
if  every  action  is  always  successful,  state-changing  events  independent  of  the  attacker's 
actions  may  affect  the  success  of  the  rest  of  the  attack  in  the  new  system  state.  For  instance, 
if  a  user  is  updating  a  vulnerable  version  of  the  software  that  was  intended  to  be  an 
adversary's  target,  the  malicious  payload  may  arrive  after  the  patch  has  already  been 
applied, thus rendering it ineffective.

Just as it is hard to model non-determinism of attack outcomes [55], it is also hard to model
the system dynamics. Because of this, in the context of a single planning run, assuming a
static environment is recommended. This can take the form of either a full snapshot of the
current system, or the removal of actions that depend on dynamic elements prior to planning.
This will simplify the planning, but may include attack plans that only work in limited
cases, and also omit ones that could have succeeded.

To accommodate state dynamics in the planning process, there are several
approaches one may consider. One is to favour plans that depend less on propositions and
objects that are tagged as dynamic; the tags can be machine learned through training or
manually determined by a human domain expert. Another approach is to create robust
attack plans with contingencies [67], which can be identified by frequent re-planning on
the latest state of the system and seeing which attack vectors remain valid regardless of
system dynamics. Both approaches make the planning problem larger, and the increased
robustness of attack plans discovered may come at the cost of significantly added
computation time.
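
The first approach, favouring plans that lean less on dynamic elements, could be applied as a ranking step over plans that have already been found. The sketch below assumes each action's preconditions are available as a set of propositions and that a domain expert has supplied the set tagged as dynamic; the scoring rule (a simple count, with plan length as tie-breaker) and all names are hypothetical.

```python
def dynamic_dependency(plan, preconditions, dynamic):
    """Count the precondition propositions in a plan that are tagged as dynamic."""
    return sum(len(preconditions[step] & dynamic) for step in plan)


def rank_by_robustness(plans, preconditions, dynamic):
    """Order plans so those relying least on dynamic propositions come first."""
    return sorted(
        plans,
        key=lambda plan: (dynamic_dependency(plan, preconditions, dynamic), len(plan)),
    )


# Hypothetical action preconditions and dynamic tags (illustrative only).
preconditions = {
    "phish_user": {"user_at_desk"},
    "use_session": {"user_logged_in", "user_at_desk"},
    "exploit_web": {"web_vulnerable"},
    "pivot": {"shell_web"},
}
dynamic = {"user_at_desk", "user_logged_in"}
plans = [["phish_user", "use_session"], ["exploit_web", "pivot"]]
ranked = rank_by_robustness(plans, preconditions, dynamic)
print(ranked[0])
```

Because the ranking runs after planning, it adds little computation, but it can only reorder plans the planner has already produced.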

4.2.4  Time  and  Resource  Constraints 

In  the  real  network  environment,  both  the  adversary  and  their  target  have  resource 
limitations;  a  small  attack  team  with  a  dozen  laptops  will  have  difficulty  orchestrating  a 
successful Distributed Denial of Service attack on an organization like Google, but would be
sufficient to take down a small business's website with budget hosting. Some attacks may
also  require  taking  advantage  of  certain  time  windows  such  as  between  the  announcement 
of  a  new  software  patch,  and  the  system  administrator  installing  the  patch  on  the  target 
machines.  Moreover,  such  attacks  would  only  be  possible  if  this  window  of  vulnerability 
[68]  and  the  exploit  method  are  known  and  accomplishable  given  the  attacker's  resources. 

For encoding the CRT exercise as a planning problem, it is recommended that these
resource constraints are not a factor in the validity of the attacker's plan. The quality and
validity of attack plans under constraint can be quickly checked, archived and ranked once
the plans are found.
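
A post-planning check of this kind might look like the sketch below. It assumes each action has a known cost per resource and that the adversary's budget is given per resource; the resource names ("dollars", "hours"), the choice to rank by hours, and the treatment of any resource missing from the budget as unavailable are all assumptions made for illustration.

```python
def plan_cost(plan, cost):
    """Sum the per-resource cost of every step in a plan."""
    total = {}
    for step in plan:
        for resource, amount in cost[step].items():
            total[resource] = total.get(resource, 0) + amount
    return total


def triage(plans, cost, budget):
    """Keep the plans the modelled adversary can afford, fewest hours first."""
    feasible = []
    for plan in plans:
        total = plan_cost(plan, cost)
        # A resource absent from the budget is treated as unavailable (limit 0).
        if all(amount <= budget.get(res, 0) for res, amount in total.items()):
            feasible.append((total.get("hours", 0), plan))
    return [plan for _, plan in sorted(feasible, key=lambda item: item[0])]


# Hypothetical action costs and adversary budget (illustrative only).
cost = {
    "ddos": {"dollars": 50000, "hours": 2},
    "exploit_web": {"dollars": 200, "hours": 8},
    "phish": {"dollars": 100, "hours": 3},
    "pivot": {"dollars": 0, "hours": 4},
}
budget = {"dollars": 5000, "hours": 24}
plans = [["ddos"], ["exploit_web", "pivot"], ["phish", "pivot"]]
print(triage(plans, cost, budget))
```

Keeping this step outside the planner means the same set of valid plans can be re-triaged for different adversary profiles without re-planning.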

4.2.5  Optimality  and  Ordering 

Complex  attacks  that  involve  multiple  entities  in  a  Cyber  environment  can  usually  be 
conducted  in  various  ways.  For  instance,  you  could  attempt  to  steal  a  user's  email  account 
to  get  at  their  other  email-verified  accounts,  or  you  could  forge  and  send  a  password  reset 
email  to  trick  the  owner  into  providing  these  details  on  a  phishing  site.  There  may  also  be 
multiple attack plans that can be used to achieve the same goal, and in reality the
resources an adversary has are limited, so they would most likely try the most cost-effective
attacks first.

For  initial  development,  we  are  more  concerned  with  attack  possibility  and  impact  than 
attack  efficiency,  so  we  do  not  need  a  planner  to  generate  or  guarantee  an  optimal  plan.  As 
long  as  it  generates  valid  attack  plans  in  reasonable  time,  and  the  plans  are  totally  ordered, 
contingencies  can  be  appended  onto  these  plans  if  non-determinism  is  later  introduced  or 
modelled for the same problems.

4.2.6 Preference and Constraints

We assume that the adversary has resource limitations, and therefore cannot conduct
attacks beyond their means. As such, action costs also need to be modelled and factored
into the planning process. If records of past attacks by the adversary exist, the
preferences they reveal about how the adversary attacks are also worth providing to the
planner.


4.3  Candidate  Tools 

We  have  characterised  and  abstracted  the  CRT  domain  as  a  classical  planning  problem. 
Based  on  the  information  from  the  planner  capability  table,  as  well  as  the  IPC  results  from 
Section 3, planning graph planners with parameter tuning or portfolio-based learning
appear to be the most suitable tools for CRT exercise planning and execution.

Below  is  a  list  of  planning  tools/frameworks  that  use  planning  graphs,  and  either  contain 
learning  capabilities  or  can  be  easily  extended  to  incorporate  a  learning  step.  They  were 
selected  from  a  larger  pool  of  planning  graph  planners  based  on  their  individual 
performance  in  the  more  recent  IPCs. 

4.3.1  LAMA 

Developed by Silvia Richter from NICTA, the LAMA system is part of the latest generation
of heuristics-based forward searching techniques with planning graphs [69] [70]. LAMA
identifies landmarks, propositions that must hold at some point in every valid plan, to
decompose the problem, and arrives at valid plans much faster than purely heuristic-driven
end-to-end approaches. It was the best performing classical planner in both the 6th and 7th
IPC, and is considered the state-of-the-art general purpose planner.
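
The notion of a landmark can be illustrated with a simple relaxation-based test: a proposition is treated as a landmark if the goal becomes unreachable, with delete effects ignored, once every action that achieves it is removed. This is a generic sketch of the idea and is not claimed to be LAMA's own landmark extraction procedure; the function names and toy actions are hypothetical.

```python
def relaxed_reachable(init, goal, actions):
    """True if the goal can be reached when delete effects are ignored."""
    facts = set(init)
    changed = True
    while changed and not goal <= facts:
        changed = False
        for pre, add in actions:
            if pre <= facts and not add <= facts:
                facts |= add
                changed = True
    return goal <= facts


def find_landmarks(init, goal, actions):
    """Return propositions without which the goal is unreachable in the relaxation."""
    candidates = set().union(*(add for _, add in actions)) - set(init)
    landmarks = set()
    for prop in candidates:
        pruned = [(pre, add) for pre, add in actions if prop not in add]
        if not relaxed_reachable(init, goal, pruned):
            landmarks.add(prop)
    return landmarks


# Hypothetical actions as (preconditions, add effects) pairs (illustrative only).
actions = [
    (frozenset({"foothold"}), frozenset({"shell_web"})),   # exploit web server
    (frozenset({"foothold"}), frozenset({"creds"})),       # phish a user
    (frozenset({"shell_web"}), frozenset({"db_access"})),  # pivot from web server
    (frozenset({"creds"}), frozenset({"db_access"})),      # log in with stolen creds
    (frozenset({"db_access"}), frozenset({"data"})),       # dump the database
]
landmarks = find_landmarks({"foothold"}, frozenset({"data"}), actions)
print(sorted(landmarks))
```

In this toy problem database access is unavoidable while the web shell and the stolen credentials are interchangeable, so only the former is a useful intermediate target for decomposition.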

4.3.2  FD-Autotune 

Developed  by  a  team  at  University  of  Huddersfield  [16],  FD-Autotune  is  a  machine 
learning  variant  and  extension  to  Helmert's  Fast  Downward  algorithm  [71],  which  uses 
parameter  tuning  and  macro  action  learning  to  supplement  a  hybrid  planning  engine.  This 
engine  uses  planning  graphs  to  reduce  the  search  space,  and  features  automated  HTN 
construction  in  the  pre-processing  stage  which  can  make  plan  drafting  faster  than 
traditional  approaches. 

The learning engine combines the automated algorithm configuration tool ParamILS
[72] and the HAL experimentation environment [73], which profile a particular problem
using training examples, and optimize parameters for various heuristic configurations
used to guide the search. FD-Autotune also manages a library of heuristics and, through
learning, is able to automatically select the most suitable heuristic(s) for a given domain
and problem type. FD-Autotune can optimize the planning for either planning speed
(generating satisficing plans very quickly) or plan quality (based on red team preferences).

The learning aspect of FD-Autotune is likely to be robust as it uses established tools for the
parameter tuning. Combined with the FD algorithm, it has achieved solid benchmark
performance at the recent IPCs when planning for speed, though the increase in running time
when optimizing for plan quality makes it comparatively slower than other speed-driven
planners.

4.3.3  PbP2 

Developed  by  a  team  at  the  University  of  Brescia  [50],  the  Portfolio-based  Planner  (PbP)  is 
an  ensemble  learning  planner.  It  constructs  a  portfolio  for  each  domain  to  determine 
which  planners  in  its  library  are  most  suitable  to  complete  the  planning.  Additionally,  PbP 
constructs  distinct  macro  actions  for  every  planner  in  the  portfolio,  allows  for  automated 
parameter  tuning,  and  generates  a  promising  configuration. 

The  latest  release  of  PbP  (version  2)  won  the  overall  learning  track  of  IPC-2011,  and  in 
terms  of  overall  plan  success,  outperformed  all  planners  in  the  classical  track,  including 
LAMA. It is more resource-intensive due to the training component, but the
newest stable release features a distributed architecture, permitting concurrent scheduling
of multiple planners for training or planning work.

4.3.4  Conclusion 

With  respect  to  CRT,  LAMA  appears  to  be  the  most  suitable  for  quick  deployment  since  it 
has  the  highest  success  rate  of  the  classical  planners,  and  doesn't  require  training  for 
operational  use.  FD-Autotune  and  PbP2  may  be  more  valuable  in  the  long  run  if 
conducting  CRT  is  part  of  the  organization's  business  process,  as  continuous  (offline) 
learning  will  improve  the  planner's  understanding  of  what  attacks  are  most  relevant  to  the 
organization deploying it.


5. Recommendations and Future Work


This  report  discussed  Cyber  Red  Teaming,  what  is  involved  with  the  modelling  of  each 
problem,  attack  plan  generation  and  associated  challenges.  It  has  also  introduced  the 
fundamental  and  state-of-the-art  theories  and  techniques  in  automated  planning,  analysed 
performance  of  various  techniques  from  the  IPC,  and  recommended  the  most  suitable 
tools  based  on  results  as  well  as  compatibility  to  the  CRT  problem. 

Our recommendation is that planning graph planners are the most suitable approach for
CRT, especially for real-time deployment on sensitive systems, due to their scalable
performance and measured success rate. Furthermore, if CRT needs to be conducted
frequently, planner implementations that incorporate machine learning will be even better.
The tools we recommend for trialling are LAMA, FD-Autotune and PbP2, as they have
been shown to perform well above their peers.

From  here,  two  steps  are  possible.  The  first  would  be  to  set  up  and  conduct  CRT  on  a  large 
organisation's  computing  environment  using  one  of  these  tools,  in  order  to  verify  and 
more  accurately  measure  the  benefit  they  bring  compared  to  traditional  hands-on 
approaches  to  CRT.  The  other  step  would  be  to  identify  specialized  planning  tools  for 
organizational  CRT,  and  study  their  capabilities  and  performance. 

Automated Planning is a well-established field of research, but its application to Cyber
Red Teaming is relatively unexplored and deserves further investigation through
collaboration between experts from both fields.

Finally, the CRT problem is evolving: organizations may switch to using cloud
infrastructure, business policies may change to allow staff to bring unvetted (and thus
unmodelled) personal electronics into the workplace, or major incidents may change the
level of fidelity required of mitigation plans.

Therefore, in addition to trialling planners for CRT, continuous monitoring of new research
and other benchmarking results collected by academia and industry is highly
recommended. This will help ensure that state-of-the-art techniques are used in Automated
CRT to obtain the best possible performance and results.
Appendix A: Planners' Capabilities

All  planners  mentioned  in  or  studied  as  part  of  this  report  are  listed  below,  ordered 
alphabetically.  Several  iterations  of  the  same  planner  may  appear  if  they  each  offer 
significantly  different  capabilities.  Noteworthy  planners  have  been  highlighted  in  green. 

Citation count refers to the total number of other books, papers and journal articles that
have cited the main paper(s) describing the planner (see footnote 4). This provides some
indication of the impact and influence each planner has within the academic community
(ignoring time).


Planner 

Source  Code 

Citation  count 

Plan  Output 

Planning  Technique 

Determinism 

Observability 

Preference and/or Quality-driven planning?

Handles Dynamic States (Contingent Planning)?

Hand-coded Control Knowledge?

Machine  Learning 

PDDL  Compliance 

AltAlt  [40] 

38 

SP 

PG 

D 

F 

- 

- 

- 

- 

1.0 

APPL  [74] 

[75] 

196 

ST 

POMDP 

C 

P 

** 

ArvandHerd  [76] 

9 

- 

- 

- 

Yes 

3.0 

BDDPlan  [77] 

26 

OT 

MC 

D 

F 

- 

- 

- 

- 

1.0 

BLACKBOX  [78] 
[79] 

[52] 

691 

OT 

SAT 

D 

F 

- 

- 

- 

- 

1.0 

O  [80] 

9 

ST 

PG 

D 

F 

- 

- 

- 

- 

3.0 

Conformant-FF 

[27] 

[81] 

169 

ST 

PG 

N 

P 

Yes 

Yes 

- 

- 

2.1 

CPT  [82] 

162 

OP 

PS 

D 

F 

- 

- 

- 

- 

2.2 

DAEYAHSP  [83] 
[84] 

28 

ST 

PG 

D 

F 

Yes 

- 

- 

- 

3.0 

DTG  [85] 

13 

ST 

PG 

D 

F 

Yes 

- 

- 

- 

Fast  Downward 
[71] 

[71] 

401 

ST 

PG/HTN 

D 

F 

-

- 

- 

Yes 

2.2 

FCPlanner  [53] 

22 

FD-Autotune  [16] 

[86] 

9 

ST 

PG/HTN 

D 

F 

Yes 

Yes 

Yes 

3.0 

FDP  [87] 

18 

OT 

SAT/SS 

D 

F 

- 

- 

- 

- 

3.0 

FD-SS  [88] 

17 

FF  [28] 

[81] 

1290 

ST 

PG 

D 

F 

- 

- 

- 

- 

1.0 

FF(Ha)  [89] 

58 

ST 

PG 

D 

F 

FF-rePlan  [90] 

134 

ST 

PG 

F 

FPG  [49] 

60 

GAMER  [39] 

[91] 

11 

OT 

PG 

D 

F 

Yes 

3.0 

Glutton  [92] 

1 

GraphPlan  [36] 

[93] 

2008 

OT 

PG 

D 

F 

GRT  [94] 

61 

OT 

SS 

D 

F 

- 

- 

- 

- 

1.0 

HSP  [37] 

261 

ST 

SS 

D 

F 

- 

- 

- 

Yes 

1.0 

HSP2  [37] 

657 

ST 

SS 

D 

F 

Yes 

1.0 

IxTeT  [95] 

219 

OP 

HTN 

D 

F 

- 

Yes 

- 

- 

2.1 

LAMA  [69]  [70] 

[96] 

153 

PG 

2.2 

4  using  Google  Scholar  citation  database  results  on  30-Jul-2013 
Planner

Source  Code 

Citation  count 

Plan  Output 

Planning  Technique 

Determinism 

Observability 

Preference and/or Quality-driven planning?

Handles Dynamic States (Plan Contingency)?

Hand-coded Control Knowledge?

Machine  Learning 

PDDL  Compliance 

LPG  [97] 

210 

SP 

PG 

D 

F 

Yes 

- 

- 

- 

2.1 

LPG-TD  [98] 

30 

SP 

PG 

D 

F 

Yes 

- 

- 

- 

2.2 

M&S  [99] 

24 

SS 

Macro-FF  [100] 

100 

ST 

PG 

D 

F 

- 

- 

- 

Yes 

2.2 

Marvin  [101] 

60 

ST 

PG 

D 

F 

- 

- 

- 

Yes 

2.2 

MaxPlan  [102] 

102 

O 

SS 

Metric-FF  [38] 

[81] 

293 

ST 

PG 

D 

F 

Yes 

2.1 

mGPT  [103] 

54 

MIPS  [104] 

81 

OT 

MC 

D 

F 

Yes 

- 

- 

- 

2.2 

NMRDPP  [105] 

26 

ST 

MDP 

P 

OptiPlan  [106] 

25 

OP 

PG 

D 

F 

- 

- 

- 

- 

2.2 

PbP  [50]  [107] 

[108] 

28 

OT 

Hybrid 

D 

F 

Yes 

Yes 

3.0 

PbR  [109] 

49 

ST 

SAT 

D 

F 

Yes 

- 

Yes 

Yes 

1.0 

PGP  [110] 

[42] 

127 

PG 

Plan-A  [111] 

9 

ST 

SAT 

D 

F 

Yes 

- 

- 

- 

3.0 

POMCP  [112] 

84 

POMDP 

PropPlan  [113] 

38 

OT 

PG 

D 

F 

- 

- 

- 

- 

1.0 

Sapa  [114] 

159 

SP 

SS 

D 

F 

Yes 

- 

- 

- 

2.1 

SATPlan  [115] 
[116] 

914 

OT 

SAT 

SGP  [117] 

318 

SP 

PG 

C 

P 

Yes 

1.0 

SGPlan  [118] 

53 

ST 

PG 

D 

F 

- 

- 

- 

- 

2.1 

SGPlan4  [119] 

[120] 

133 

ST 

PG 

D 

F 

- 

- 

- 

- 

2.2 

SGPlan5  [121]  [122] 

[120] 

28 

ST 

PG 

D 

F 

Yes 

- 

- 

- 

3.0 

SGPlan6  [123] 

26 

ST 

PG 

D 

F 

Yes 

- 

3.0 

SHOP  [43] 

391 

OT 

HTN 

D 

F 

- 

- 

Yes 

- 

- 

SHOP2  [124] 

585 

OT 

HTN 

D 

F 

Yes 

- 

STAN  [125] 

184 

ST 

PG 

D 

F 

- 

- 

- 

- 

1.0 

Symbolic  Heuristic 
Search  [126] 

91 

ST 

MDP/MC 

C 

F 

Yes 

- 

- 

- 

Symbolic  Perseus 
[127]  [128] 

[129] 

391 

ST 

POMDP 

C 

P 

Yes 

- 

- 

* 

System  R  [130] 

20 

OT 

SS 

D 

F 

- 

- 

Yes 

- 

1.0 

To  [21] 

[131] 

57 

TALPlanner  [132] 

124 

ST 

SS 

D 

F 

- 

- 

- 

- 

1.0 

TLP-GP  [133] 

10 

ST 

SAT 

D 

F 

- 

- 

- 

- 

3.0 

TLPlan  [134] 

469 

OT 

SS 

D 

F 

- 

- 

- 

- 

2.1 

TP4  [135] 

164 

OT 

SS 

D 

F 

Yes 

- 

- 

- 

2.1 

UCPOP  [51] 

846 

OP 

PS 

VHPOP  [136] 

121 

OP 

PS 

D 

F 

- 

- 

- 

- 

2.1 

Wizard  [137] 

[138] 

46 

YAHSP2-MT  [139] 

7 

ST 

SS 

{Plan output types: OT = Optimal, Total-Order | OP = Optimal, Partial-Order | ST = Satisficing, Total-Order | SP = Satisficing, Partial-Order}

{Planning technique used: SS = State-space | PS = Plan-space | PG = Planning Graph | SAT = Planning as Satisfiability | HTN = Hierarchical Task Networks | MC = Model Checking | MDP = Markov Decision Process solver | POMDP = Partially Observable MDP solver | ML = Machine Learning | Hybrid planners list every planning technique used}

{Determinism: D = Deterministic | P = Probabilistic | C = Conformant}

{Observability: F = Fully Observable | P = Partially Observable}
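
For readers who want to work with the table programmatically, the legend codes above can be encoded as simple lookup tables. The following is a minimal sketch only: the names `PlannerEntry`, `expand_technique` and `describe` are hypothetical, and the AltAlt values assume that row is read as 38 citations, SP, PG, D, F and PDDL 1.0, with the Source Code cell blank.

```python
from dataclasses import dataclass
from typing import List

# Lookup tables transcribed from the legend of this appendix.
PLAN_OUTPUT = {
    "OT": "Optimal, Total-Order",
    "OP": "Optimal, Partial-Order",
    "ST": "Satisficing, Total-Order",
    "SP": "Satisficing, Partial-Order",
}
TECHNIQUE = {
    "SS": "State-space",
    "PS": "Plan-space",
    "PG": "Planning Graph",
    "SAT": "Planning as Satisfiability",
    "HTN": "Hierarchical Task Networks",
    "MC": "Model Checking",
    "MDP": "Markov Decision Process solver",
    "POMDP": "Partially Observable MDP solver",
    "ML": "Machine Learning",
}
DETERMINISM = {"D": "Deterministic", "P": "Probabilistic", "C": "Conformant"}
OBSERVABILITY = {"F": "Fully Observable", "P": "Partially Observable"}


def expand_technique(code: str) -> List[str]:
    """Expand a technique cell such as 'PG/HTN'.
    Codes not in the legend are returned unchanged."""
    return [TECHNIQUE.get(part, part) for part in code.split("/")]


@dataclass
class PlannerEntry:
    """Hypothetical record for one table row (subset of the columns)."""
    name: str
    citations: int
    plan_output: str
    technique: str
    determinism: str
    observability: str
    pddl: str

    def describe(self) -> str:
        """Render the coded cells as a readable one-line summary."""
        techniques = " + ".join(expand_technique(self.technique))
        return (f"{self.name}: {PLAN_OUTPUT[self.plan_output]}; {techniques}; "
                f"{DETERMINISM[self.determinism]}; "
                f"{OBSERVABILITY[self.observability]}; PDDL {self.pddl}")


# Assumed reading of the AltAlt row (Source Code cell taken to be blank).
altalt = PlannerEntry("AltAlt", 38, "SP", "PG", "D", "F", "1.0")
print(altalt.describe())
```

Codes that appear in the table but not in the legend are deliberately left unexpanded by `expand_technique` rather than guessed.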